Purpose

This script takes a deep dive into Landsat 9 labels for a more rigorous analysis of inconsistent band data and outliers in the filtered label dataset. Here we will determine if any more label data points should be removed from the training dataset and whether or not we can glean anything from the metadata in the outlier dataset to be able to pre-emptively toss out scenes when we go to apply the classification algorithm.

harmonize_version = "2024-04-25"
outlier_version = "2024-04-25"

LS9 <- read_rds(paste0("data/labels/harmonized_LS89_labels_", harmonize_version, ".RDS")) %>% 
  filter(mission == "LANDSAT_9")

Check for mis-matched band data between user data and re-pull

Just look at the data to see consistent (or inconsistent) user-pulled data and our pull, here, our user data are in “BX” format and the re-pull is in “SR_BX” format. These are steps to assure data quality if the volunteer didn’t follow the directions explicitly.

pmap(.l = list(user_band = LS89_user,
               ee_band = LS89_ee,
               data = list(LS9),
               mission = list("LANDSAT_9")),
     .f = make_band_comp_plot)
## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

## 
## [[6]]

There isn’t a ton of mis-match here, we’ll just use B7/SR_B7 as a reference to filter inconsistent labels

LS9_inconsistent <- LS9 %>% 
  filter(is.na(SR_B7) | B2 != SR_B2 | B3 != SR_B3 | 
           B4 != SR_B4 | B5 != SR_B5 | B6 != SR_B6 | B7 != SR_B7)

LS9_inconsistent %>% 
  group_by(class) %>% 
  summarise(n_labels = n()) %>% 
  kable()
class n_labels
cloud 3
darkNearShoreSediment 3
lightNearShoreSediment 5
offShoreSediment 3
other 3

Not much to report, we can just drop these few inconsistent band values.

LS9_filtered <- LS9 %>% 
  filter(# filter data where the repull data and user data match
         (B2 == SR_B2 & B3 == SR_B3 & 
           B4 == SR_B4 & B5 == SR_B5 & B6 == SR_B6 & B7 == SR_B7),
         # or where any re-pulled band value is greater than 1, which isn't a valid value
         if_all(LS89_ee,
                ~ . <= 1))

And plot:

## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

## 
## [[6]]

And now let’s look at the data by class:

## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

## 
## [[6]]

We aren’t actually modeling “other” (not sufficient observations to classify) or “shorelineContamination” (we’ll use this later to block areas where there is likely shoreline contamination in the AOI).

LS9_for_class_analysis <- LS9_filtered %>% 
  filter(!(class %in% c("other", "shorelineContamination")))
## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

## 
## [[6]]

Interesting - the classes look really similar in distribution (maybe because cloud categories are so high). It will be interesting to see if there are statistical differences.

Outlier handling

There are statistical outliers within this dataset and they may impact the interpretation of any statistical testing we do. Let’s see if we can narrow down when those outliers and/or glean anything from the outlier data that may be applicable to the the application of the algorithm. Outliers may be a systemic issue (as in the scene is an outlier), it could be a user issue (a user may have been a bad actor), or they just might be real. This section asks those questions. The “true outliers” that we dismiss from the dataset will also be used to help aid in interpretation/application of the algorithm across the Landsat stack, so it is important to make notes of any patterns we might see in the outlier dataset.

## [1] "Classes represented in outliers:"
## [1] "lightNearShoreSediment" "offShoreSediment"       "openWater"

Okay, 26 outliers (>1.5*IQR) out of 437 - and they are all from non-cloud groups, and none of them are dark near shore sediment.

Scene Level Metadata

How many of these outliers are in specific scenes?

LS9_out_date <- outliers %>% 
  group_by(date, vol_init) %>% 
  summarize(n_out = n())
LS9_date <- LS9_for_class_analysis %>% 
  filter(class != "cloud") %>% 
  group_by(date, vol_init) %>% 
  summarise(n_tot = n())
LS9_out_date <- left_join(LS9_out_date, LS9_date) %>% 
  mutate(percent_outlier = n_out/n_tot*100) %>% 
  arrange(-percent_outlier)
LS9_out_date %>% 
  kable()
date vol_init n_out n_tot percent_outlier
2022-06-06 KRW 7 17 41.176471
2022-05-21 MRB 12 82 14.634146
2021-11-06 KRW 3 28 10.714286
2022-05-05 HAD 4 113 3.539823

There is just one scene with a high proportion of ouliers - perhaps there is something about the AC in these particular scenes? or the general scene quality?

LS9_out_date %>% 
  filter(percent_outlier > 20) %>% 
  select(date, vol_init) %>% 
  left_join(., LS9) %>% 
  select(date, vol_init, DATA_SOURCE_AIR_TEMPERATURE:max_cloud_cover) %>% 
  distinct() %>% 
  kable()
date vol_init DATA_SOURCE_AIR_TEMPERATURE DATA_SOURCE_ELEVATION DATA_SOURCE_OZONE DATA_SOURCE_PRESSURE DATA_SOURCE_REANALYSIS DATA_SOURCE_WATER_VAPOR NADIR_OFFNADIR CLOUD_COVER_list IMAGE_QUALITY_OLI_list IMAGE_QUALITY_TIRS_list mean_cloud_cover max_cloud_cover
2022-06-06 KRW MODIS GLS2000 MODIS Calculated GEOS-5 FP-IT MODIS NADIR 53.08; 27.14 9 9 40.11 53.08

Image quality is high and cloud cover is reasonable. Let’s look at that image:

This does indeed look a bit hazy, but there are some really interesting offshore sediment plumes here. Hard to tell if the color is in any way due to aerosol contamination, though. (We’ll look into that later.)

Let’s look at instances where outliers are in at least three bands for a given label:

date class vol_init user_label_id n_bands_out bands_out
2022-06-06 lightNearShoreSediment KRW 1141 3 SR_B5; SR_B6; SR_B7
2022-06-06 offShoreSediment KRW 1152 3 SR_B2; SR_B5; SR_B6

These are from the same scene. Let’s see if it shows up in our QA summaries.

Clouds

How many of these outliers have near-pixel clouds (as measured by ST_CDIST)?

There are 26 labels (100% of oultiers) that aren’t “cloud” in the outlier dataset that have a cloud distance <500m and 244 labels (55.8%) in the whole dataset that have a cloud distance <500m. This is unhelpful.

How many of the outliers have high cloud cover, as reported by the scene-level metadata? Note, we don’t have the direct scene cloud cover associated with individual labels, rather a list of the scene level cloud cover values associated with the AOI.

The outlier dataset contains 12 (46.2%) where the max cloud cover was > 75% and 12 (46.2%) where the mean cloud cover was > 50%. The filtered dataset contains 82 (18.8%) where max was >75% and 82 (18.8%) where the mean cloud cover was > 50%. There is definitely a higher proportion of labels in the outlier group that have higher cloud cover. We might be able to pull this out with the aerosol analysis, too.

QA Pixels

Do any of the labels have QA pixel indications of cloud or cloud shadow? The first pass here is for all data that don’t have a label of “cloud” (not just outliers). We’ll use the high certainty mask here, as low certainty is too greedy.

LS9_for_class_analysis %>% 
  mutate(QA = case_when(cirrus_conf >= 3 ~ "cirrus",
                        cloud_conf >= 3 ~ "cloud",
                        cloudshad_conf >= 3 ~ "cloud shadow",
                        snowice_conf >= 3 ~ "snow/ice",
                        TRUE ~ "clear")) %>% 
  group_by(QA) %>% 
  filter(class != "cloud") %>% 
  summarize(n_tot = n()) %>% 
  kable()
QA n_tot
cirrus 7
clear 233
cloud shadow 3
snow/ice 1

Let’s look at the cirrus group to see if there is anything egregious:

LS9_tot <- LS9_for_class_analysis %>% 
  group_by(vol_init, date) %>% 
  summarise(n_tot_labels = n())

LS9_for_class_analysis %>% 
  mutate(QA = case_when(cirrus_conf >= 3 ~ "cirrus",
                        cloud_conf >= 3 ~ "cloud",
                        cloudshad_conf >= 3 ~ "cloud shadow",
                        snowice_conf >= 3 ~ "snow/ice",
                        TRUE ~ "clear")) %>% 
  filter(class != "cloud", QA != "clear") %>% 
  group_by(date, vol_init) %>% 
  summarise(n_qa_flag = n()) %>% 
  left_join(LS9_tot) %>% 
  mutate(perc_qa_flag = round(n_qa_flag/n_tot_labels*100, 1)) %>% 
  arrange(-perc_qa_flag) %>% 
  kable()
date vol_init n_qa_flag n_tot_labels perc_qa_flag
2022-05-05 HAD 7 113 6.2
2022-05-21 MRB 3 126 2.4
2021-11-06 KRW 1 65 1.5

Let’s look at this scene - there are 8 labels (that aren’t clouds) that have a cirrus flag:

This is quite hazy in the SE corner of the AOI, near the Apostles. While we can keep this scene since it is otherwise clear, we will again trust the QA bit to remove high aerosol pixels.

Aerosol QA bit

Landsat 9 and 9 feature an Aerosol QA band, derived from Band 1. We should look through the data here to see if any of the labels are in high aerosol QA pixels, which the USGS suggests should not be used.

LS9_for_class_analysis %>% 
  filter(aero_level == 3, class != "cloud") %>% 
  group_by(vol_init, date) %>%
  summarise(n_high_aero_labels = n()) %>% 
  left_join(LS9_tot) %>% 
  mutate(perc_high_aero = round(n_high_aero_labels/n_tot_labels*100, 1)) %>% 
  arrange(-perc_high_aero) %>% 
  kable()
vol_init date n_high_aero_labels n_tot_labels perc_high_aero
HAD 2022-05-05 67 113 59.3
MRB 2022-05-21 20 126 15.9
KRW 2022-06-06 9 57 15.8

We already know that the scenes 2022-05-05 and 2022-06-06 have been flagged, let’s also look at 2022-05-21:

Well, some of the scene is pretty clear, but there is some of the green-cloud artifact of high aerosol evident in this scene.

I think we can trust the QA bits here.

Training dataset implications

For the purposes of training data, we are going to throw out the the high aerosol and any QA flagged label, as well as the labels from that May 2022 image with a high proportion of aerosol contamination.

LS9_training_labels <- LS9_for_class_analysis %>% 
  filter(date != "2022-05-05" | 
           class == "cloud" |
           (aero_level < 3 &
              cirrus_conf < 3 &
              cloud_conf < 3 &
              cloudshad_conf < 3 & 
              snowice_conf < 3))

Testing for inter-class differences

We do want to have an idea of how different the classes are, in regards to band data. While there are a bunch of interactions that we could get into here, for the sake of this analysis, we are going to analyze the class differences by band.

Kruskal-Wallis assumptions:

  1. Data are non-Normal or have a skewed distribution
  2. There must be at least two independent groups.
  3. Data have a similar distribution across groups.
  4. Data are independent, the groups shouldn’t have a relationship to one another
  5. Each group should contain at least 5 observations

ANOVA assumptions:

  1. data are distributed normally
  2. data are independent
  3. variance across groups is similar

We can’t entirely assert sample independence and we know that variance and distribution is different for “cloud” labels, but those data also are visibly different from the other classes.

In order to systematically test for differences between classes and be able to intepret the data, we will need to know some things about our data:

  1. Are the data normally distributed (Shapiro-Wilkes)?
  2. Are there outliers that may impact interpretation?
  3. If data is non-normal, perform Kruskal-Walis test; otherwise ANOVA
  4. if the null is rejected (and there is a difference in at least one class), perform post-hoc test for pairwise comparison (Dunn test for both)

With this workflow, most classes are statistically different - below are the cases where the pairwise comparison were not deemed statistically significant:

## # A tibble: 23 × 9
##    band  group1         group2    n1    n2 statistic       p  p.adj p.adj.signif
##    <chr> <chr>          <chr>  <int> <int>     <dbl>   <dbl>  <dbl> <chr>       
##  1 SR_B2 darkNearShore… light…    29    64     2.73  0.00626 0.0626 ns          
##  2 SR_B2 darkNearShore… offSh…    29    65    -0.757 0.449   1      ns          
##  3 SR_B2 darkNearShore… openW…    29    19    -1.79  0.0740  0.740  ns          
##  4 SR_B2 offShoreSedim… openW…    65    19    -1.37  0.170   1      ns          
##  5 SR_B3 darkNearShore… light…    29    64     2.28  0.0228  0.228  ns          
##  6 SR_B3 darkNearShore… offSh…    29    65    -1.58  0.113   1      ns          
##  7 SR_B3 offShoreSedim… openW…    65    19    -2.22  0.0263  0.263  ns          
##  8 SR_B4 darkNearShore… light…    29    64     1.55  0.122   1      ns          
##  9 SR_B4 offShoreSedim… openW…    65    19    -1.85  0.0644  0.644  ns          
## 10 SR_B5 darkNearShore… light…    29    64     1.09  0.275   1      ns          
## # ℹ 13 more rows

A little better than LS8, arguably? still an issue with dark near shore sediment, but pretty good other than that.

Let’s look at the boxplots of the non-cloud classes:

LS9_training_labels_no_clouds <- LS9_training_labels %>% 
  filter(class != "cloud")
pmap(.l = list(data = list(LS9_training_labels_no_clouds),
               data_name = list("LANDSAT_5"),
               band = LS89_ee),
     .f = make_class_comp_plot)
## [[1]]

## 
## [[2]]

## 
## [[3]]

## 
## [[4]]

## 
## [[5]]

## 
## [[6]]

There are 2 in SR_B2, 0 in SR_B3, 0 in SR_B4, 12 in SR_B5, 7 outliers in SR_B6, and 3 in SR_B7.

DNSS: dark near shore sediment, LNSS: light near shore sediment, OSS: offshore sediment

DNSS: dark near shore sediment, LNSS: light near shore sediment, OSS: offshore sediment

There are definitely some varying patterns here, let’s zoom in on the sediment classes.

DNSS: dark near shore sediment, LNSS: light near shore sediment, OSS: offshore sediment

DNSS: dark near shore sediment, LNSS: light near shore sediment, OSS: offshore sediment

Arguably not as scatter shot bad as LS8, but still might be tough to distinguish between classes.

Export the training labels

Things to note for Landsat 9:

  • bright cloud cover and snow may impact Rrs within the waterbody leading to outliers. will need to be cautious applying the algo when snow is on the ground!
  • must mask high aerosol pixels, as they will get labeled as something else entirely.
  • pixels with QA flags should be dismissed from model application
write_rds(LS9_training_labels, paste0("data/labels/LS9_labels_for_tvt_", outlier_version, ".RDS"))